Automatic Translation Error Analysis
Abstract
We propose a method for automatically identifying various error types in machine translation output. The approach is based mostly on monolingual word alignment of the hypothesis and the reference translation. In addition to common lexical errors, misplaced words are also detected. A comparison with manually classified MT errors is presented. Our error classification is inspired by that of Vilar (2006; [17]), although distinguishing some of their categories is beyond the reach of the current version of our system.
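As a rough illustration of the idea of alignment-based error detection, the sketch below aligns a hypothesis to a reference on exact word matches and labels the leftover words. It is not the authors' alignment method: the function name `classify_errors`, the use of Python's `difflib.SequenceMatcher` as the aligner, and the three labels are all assumptions made for this example. Lexical errors are not separated out here; a wrong word choice would simply show up as one extra word plus one missing word.

```python
from collections import Counter
from difflib import SequenceMatcher


def classify_errors(hypothesis: str, reference: str) -> dict:
    """Toy error classification from an exact-match monolingual alignment.

    Hypothetical helper for illustration only; a real system would use a
    more robust word aligner than difflib.
    """
    hyp = hypothesis.split()
    ref = reference.split()

    # Step 1: align words that match in the same relative order.
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    ref_aligned, hyp_aligned = set(), set()
    for block in matcher.get_matching_blocks():
        ref_aligned.update(range(block.a, block.a + block.size))
        hyp_aligned.update(range(block.b, block.b + block.size))

    # Step 2: reference words left without an in-order partner.
    leftover_ref = Counter(w for i, w in enumerate(ref) if i not in ref_aligned)

    # Step 3: an unaligned hypothesis word that also occurs among the
    # leftover reference words is treated as misplaced; otherwise as extra.
    misplaced, extra = [], []
    for i, word in enumerate(hyp):
        if i in hyp_aligned:
            continue
        if leftover_ref[word] > 0:
            leftover_ref[word] -= 1
            misplaced.append(word)
        else:
            extra.append(word)

    # Step 4: reference words that were never matched are missing.
    missing = list(leftover_ref.elements())
    return {"misplaced": misplaced, "extra": extra, "missing": missing}


if __name__ == "__main__":
    print(classify_errors("today the cat sat on mat big",
                          "the cat sat on the mat today"))
```

Exact string matching keeps the sketch short, but it means inflected or synonymous words are never aligned, so it would over-report missing and extra words on real MT output.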
Similar resources
Towards Heterogeneous Automatic MT Error Analysis
This work studies the viability of performing heterogeneous automatic MT error analyses. Error analysis is, undoubtedly, one of the most crucial stages in the development cycle of an MT system. However, often not enough attention is paid to this process. The reason is that performing an accurate error analysis requires intensive human labor. In order to speed up the error analysis process, we sug...
Automatic Error Analysis Based on Grammatical Questions
The present paper proposes automatic error analysis methods that use patterns representing grammatical check points. Our method is comparable to or slightly outperforms conventional methods for automatic evaluation metrics. Unlike the conventional methods, our method enables error analysis for each grammatical check point. While our method does not depend on languages, we experimentally...
Towards Automatic Error Analysis of Machine Translation Output
Evaluation and error analysis of machine translation output are important but difficult tasks. In this article, we propose a framework for automatic error analysis and classification based on the identification of actual erroneous words using the algorithms for computation of Word Error Rate (WER) and Position-independent word Error Rate (PER), which is just a very first step towards developmen...
Morpho-syntactic Information for Automatic Error Analysis of Statistical Machine Translation Output
Evaluation of machine translation output is an important but difficult task. Over the last years, a variety of automatic evaluation measures have been studied; some of them, such as Word Error Rate (WER), Position Independent Word Error Rate (PER), and the BLEU and NIST scores, have become widely used tools for comparing different systems as well as for evaluating improvements within one system. However,...
Hjerson: An Open Source Tool for Automatic Error Classification of Machine Translation Output
We describe Hjerson, a tool for automatic classification of errors in machine translation output. The tool features the detection of five word-level error classes: morphological errors, reordering errors, missing words, extra words and lexical errors. As input, the tool requires original full form reference translation(s) and hypothesis along with their corresponding base forms. It is also possi...
Processing of Swedish Compounds for Phrase-Based Statistical Machine Translation
We investigated the effects of processing Swedish compounds for phrase-based SMT between Swedish and English. Compounds were split in a pre-processing step using an unsupervised empirical method. After translation into Swedish, compounds were merged, using a novel merging algorithm. We investigated two ways of handling compound parts, by marking them as compound parts or by normalizing them to ...